AITopics | genomic data

Collaborating Authors

genomic data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration Frameworks

Oghenekaro, Emmanuella Avwerosuoghene

arXiv.org Artificial IntelligenceDec-2-2025

Early cancer detection remains one of the most critical challenges in modern healthcare, where delayed diagnosis significantly reduces survival outcomes. Recent advancements in artificial intelligence, particularly deep learning, have enabled transformative progress in medical imaging analysis. Deep learning-based computer vision models, such as convolutional neural networks (CNNs), transformers, and hybrid attention architectures, can automatically extract complex spatial, morphological, and temporal patterns from multimodal imaging data including MRI, CT, PET, mammography, histopathology, and ultrasound. These models surpass traditional radiological assessment by identifying subtle tissue abnormalities and tumor microenvironment variations invisible to the human eye. At a broader scale, the integration of multimodal imaging with radiogenomics linking quantitative imaging features with genomics, transcriptomics, and epigenetic biomarkers has introduced a new paradigm for personalized oncology. This radiogenomic fusion allows the prediction of tumor genotype, immune response, molecular subtypes, and treatment resistance without invasive biopsies.

artificial intelligence, international journal, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.7753/IJCATR1411.1001

2512.00714

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification

Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur

arXiv.org Machine LearningNov-25-2025

Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.

copula, dependence, risk score, (15 more...)

arXiv.org Machine Learning

2511.17605

Country:

North America > United States > New York (0.04)
North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

GQVis: A Dataset of Genomics Data Questions and Visualizations for Generative AI

Walters, Skylar Sargent, Valderrama, Arthea, Smits, Thomas C., Kouřil, David, Nguyen, Huyen N., L'Yi, Sehi, Lange, Devin, Gehlenborg, Nils

arXiv.org Artificial IntelligenceOct-17-2025

Data visualization is a fundamental tool in genomics research, enabling the exploration, interpretation, and communication of complex genomic features. While machine learning models show promise for transforming data into insightful visualizations, current models lack the training foundation for domain-specific tasks. In an effort to provide a foundational resource for genomics-focused model training, we present a framework for generating a dataset that pairs abstract, low-level questions about genomics data with corresponding visualizations. Building on prior work with statistical plots, our approach adapts to the complexity of genomics data and the specialized representations used to depict them. We further incorporate multiple linked queries and visualizations, along with justifications for design choices, figure captions, and image alt-texts for each item in the dataset. We use genomics data retrieved from three distinct genomics data repositories (4DN, ENCODE, Chromoscope) to produce GQVis: a dataset consisting of 1.14 million single-query data points, 628k query pairs, and 589k query chains. The GQVis dataset and generation code are available at https://huggingface.co/datasets/HIDIVE/GQVis and https://github.com/hms-dbmi/GQVis-Generation.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.13816

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)

Add feedback

ME-Mamba: Multi-Expert Mamba with Efficient Knowledge Capture and Fusion for Multimodal Survival Analysis

Zhang, Chengsheng, Qu, Linhao, Liu, Xiaoyu, Song, Zhijian

arXiv.org Artificial IntelligenceSep-23-2025

Survival analysis using whole-slide images (WSIs) is crucial in cancer research. Despite significant successes, pathology images typically only provide slide-level labels, which hinders the learning of discriminative representations from gigapixel WSIs. With the rapid advancement of high-throughput sequencing technologies, multimodal survival analysis integrating pathology images and genomics data has emerged as a promising approach. We propose a Multi-Expert Mamba (ME-Mamba) system that captures discriminative pathological and genomic features while enabling efficient integration of both modalities. This approach achieves complementary information fusion without losing critical information from individual modalities, thereby facilitating accurate cancer survival analysis. Specifically, we first introduce a Pathology Expert and a Genomics Expert to process unimodal data separately. Both experts are designed with Mamba architectures that incorporate conventional scanning and attention-based scanning mechanisms, allowing them to extract discriminative features from long instance sequences containing substantial redundant or irrelevant information. Second, we design a Synergistic Expert responsible for modality fusion. It explicitly learns token-level local correspondences between the two modalities via Optimal Transport, and implicitly enhances distribution consistency through a global cross-modal fusion loss based on Maximum Mean Discrepancy. The fused feature representations are then passed to a mamba backbone for further integration. Through the collaboration of the Pathology Expert, Genomics Expert, and Synergistic Expert, our method achieves stable and accurate survival analysis with relatively low computational complexity. Extensive experimental results on five datasets in The Cancer Genome Atlas (TCGA) demonstrate our state-of-the-art performance.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.169

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints

Farooq, Iqra, Atito, Sara, Demirkan, Ayse, Prokopenko, Inga, Rana, Muhammad

arXiv.org Artificial IntelligenceJul-8-2025

Traditional GWAS has advanced our understanding of complex diseases but often misses nonlinear genetic interactions. Deep learning offers new opportunities to capture complex genomic patterns, yet existing methods mostly depend on feature selection strategies that either constrain analysis to known pathways or risk data leakage when applied across the full dataset. Further, covariates can inflate predictive performance without reflecting true genetic signals. We explore different deep learning architecture choices for GWAS and demonstrate that careful architectural choices can outperform existing methods under strict no-leakage conditions. Building on this, we extend our approach to a multi-label framework that jointly models five diseases, leveraging shared genetic architecture for improved efficiency and discovery. Applied to five million SNPs across 37,000 samples, our method achieves competitive predictive performance (AUC 0.68-0.96),

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.05247

Genre: Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.51)
Health & Medicine > Therapeutic Area > Endocrinology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uncertainty-Aware Genomic Classification of Alzheimer's Disease: A Transformer-Based Ensemble Approach with Monte Carlo Dropout

Jo, Taeho, Lee, Eun Hye, Project, Alzheimer's Disease Sequencing

arXiv.org Artificial IntelligenceJun-3-2025

INTRODUCTION: Alzheimer's disease (AD) is genetically complex, complicating robust classification from genomic data. METHODS: We developed a transformer-based ensemble model (TrUE-Net) using Monte Carlo Dropout for uncertainty estimation in AD classification from whole-genome sequencing (WGS). We combined a transformer that preserves single-nucleotide polymorphism (SNP) sequence structure with a concurrent random forest using flattened genotypes. An uncertainty threshold separated samples into an uncertain (high-variance) group and a more certain (low-variance) group. RESULTS: We analyzed 1050 individuals, holding out half for testing. Overall accuracy and area under the receiver operating characteristic (ROC) curve (AUC) were 0.6514 and 0.6636, respectively. Excluding the uncertain group improved accuracy from 0.6263 to 0.7287 (10.24% increase) and F1 from 0.5843 to 0.8205 (23.62% increase). DISCUSSION: Monte Carlo Dropout-driven uncertainty helps identify ambiguous cases that may require further clinical evaluation, thus improving reliability in AD genomic classification.

alzheimer, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.00662

Country: North America > United States > Indiana (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Radiogenomic Bipartite Graph Representation Learning for Alzheimer's Disease Detection

Raj, Aditya, Mirzaei, Golrokh

arXiv.org Artificial IntelligenceMay-16-2025

Imaging and genomic data offer distinct and rich features, and their integration can unveil new insights into the complex landscape of diseases. In this study, we present a novel approach utilizing radiogenomic data including structural MRI images and gene expression data, for Alzheimer's disease detection. Our framework introduces a novel heterogeneous bipartite graph representation learning featuring two distinct node types: genes and images. The network can effectively classify Alzheimer's disease (AD) into three distinct stages:AD, Mild Cognitive Impairment (MCI), and Cognitive Normal (CN) classes, utilizing a small dataset. Additionally, it identified which genes play a significant role in each of these classification groups. We evaluate the performance of our approach using metrics including classification accuracy, recall, precision, and F1 score. The proposed technique holds potential for extending to radiogenomic-based classification to other diseases.

alzheimer, artificial intelligence, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2505.09848

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre:

Research Report > New Finding (0.89)
Research Report > Promising Solution (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Securing Genomic Data Against Inference Attacks in Federated Learning Environments

Pathade, Chetan, Patil, Shubham

arXiv.org Artificial IntelligenceMay-13-2025

Federated Learning (FL) offers a promising framework for collaboratively training machine learning models across decentralized genomic datasets without direct data sharing. While this approach preserves data locality, it remains susceptible to sophisticated inference attacks that can compromise individual privacy. In this study, we simulate a federated learning setup using synthetic genomic data and assess its vulnerability to three key attack vectors: Membership Inference Attack (MIA), Gradient-Based Membership Inference Attack, and Label Inference Attack (LIA). Our experiments reveal that Gradient-Based MIA achieves the highest effectiveness, with a precision of 0.79 and F1-score of 0.87, underscoring the risk posed by gradient exposure in federated updates. Additionally, we visualize comparative attack performance through radar plots and quantify model leakage across clients. The findings emphasize the inadequacy of naïve FL setups in safeguarding genomic privacy and motivate the development of more robust privacy-preserving mechanisms tailored to the unique sensitivity of genomic data.

artificial intelligence, inference attack, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2505.07188

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Gene42: Long-Range Genomic Foundation Model With Dense Attention

Vishniakov, Kirill, Amor, Boulbaba Ben, Tekin, Engin, ElNaker, Nancy A., Viswanathan, Karthik, Medvedev, Aleksandr, Singh, Aahan, Nadeem, Maryam, Sayeed, Mohammad Amaan, Kanithi, Praveenkumar, Magalhaes, Tiago, Vassilieva, Natalia, Mahapatra, Dwarikanath, Pimentel, Marco, Khan, and Shadab

arXiv.org Artificial IntelligenceMar-20-2025

We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to manage context lengths of up to 192,000 base pairs (bp) at a single-nucleotide resolution. Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allowed for the comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense attention model capable of handling such extensive long context lengths in genomics, challenging state-space models that often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity values and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks have demonstrated state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.

bioinformatics, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.16565

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.70)

Add feedback

HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction

Yang, Jiaqi, Chen, Wenting, Xing, Xiaohan, He, Sean, Luo, Xiaoling, Lyu, Xinheng, Shen, Linlin, Qiu, Guoping

arXiv.org Artificial IntelligenceMar-17-2025

Multimodal learning that integrates histopathology images and genomic data holds great promise for cancer survival prediction. However, existing methods face key limitations: 1) They rely on multimodal mapping and metrics in Euclidean space, which cannot fully capture the hierarchical structures in histopathology (among patches from different resolutions) and genomics data (from genes to pathways). 2) They discretize survival time into independent risk intervals, which ignores its continuous and ordinal nature and fails to achieve effective optimization. 3) They treat censorship as a binary indicator, excluding censored samples from model optimization and not making full use of them. To address these challenges, we propose HySurvPred, a novel framework for survival prediction that integrates three key modules: Multimodal Hyperbolic Mapping (MHM), Angle-aware Ranking-based Contrastive Loss (ARCL) and Censor-Conditioned Uncertainty Constraint (CUC). Instead of relying on Euclidean space, we design the MHM module to explore the inherent hierarchical structures within each modality in hyperbolic space. To better integrate multimodal features in hyperbolic space, we introduce the ARCL module, which uses ranking-based contrastive learning to preserve the ordinal nature of survival time, along with the CUC module to fully explore the censored data. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on five benchmark datasets. The source code is to be released.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2503.13862

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback